Classification of Landsat TM Data
Classification refers to a family of statistical techniques (or algorithms) used to sort and group data into discrete, uniquely identifiable classes. Classification algorithms are routinely used to reduce the large volume of data in a typical TM (or other sensor) dataset to a small number of classes that are meaningful to the investigator. Two major types of classification algorithm are applied to remotely sensed data: unsupervised and supervised. Unsupervised classification algorithms (such as ISODATA) cluster data iteratively according to several user-defined statistical parameters. They stop when either a specified percentage of pixels remains unchanged between iterations or a maximum number of iterations has been performed. This method is most useful when no prior knowledge or ground truth data for an area is available. However, an experienced analyst must still identify the land cover represented by each class the algorithm produces, which can be a significant disadvantage of the method.
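The iterative clustering and stopping rule described above can be illustrated with a minimal sketch. It is a simplified stand-in for ISODATA: it performs plain k-means-style reassignment and omits ISODATA's class splitting and merging. The function names, the deterministic seeding of class means along the min-to-max diagonal, and the default parameter values are illustrative assumptions, not the procedure used for the CAP LTER scenes.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two pixel vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def iterative_cluster(pixels, n_classes, unchanged_fraction=0.95, max_iterations=20):
    """Simplified ISODATA-style clustering: k-means without the split/merge steps.

    Returns (labels, class means, iterations run)."""
    # Assumption: seed the class means evenly along the diagonal from the
    # per-band minimum to the per-band maximum, so results are deterministic.
    lows = [min(band) for band in zip(*pixels)]
    highs = [max(band) for band in zip(*pixels)]
    means = []
    for k in range(n_classes):
        t = k / (n_classes - 1) if n_classes > 1 else 0.5
        means.append([lo + t * (hi - lo) for lo, hi in zip(lows, highs)])
    labels = [None] * len(pixels)
    for iteration in range(1, max_iterations + 1):
        # Assign every pixel to its nearest current class mean.
        new_labels = [min(range(n_classes), key=lambda k: squared_distance(p, means[k]))
                      for p in pixels]
        unchanged = sum(1 for old, new in zip(labels, new_labels) if old == new)
        labels = new_labels
        # Recompute each class mean from its members (empty classes keep their mean).
        for k in range(n_classes):
            members = [p for p, label in zip(pixels, labels) if label == k]
            if members:
                means[k] = [sum(band) / len(members) for band in zip(*members)]
        # Stop once enough pixels kept their label since the previous pass.
        if unchanged / len(pixels) >= unchanged_fraction:
            break
    return labels, means, iteration
```

For example, two well-separated groups of two-band pixels, such as `[(10, 12), (11, 13), (12, 11), (80, 85), (82, 84), (81, 86)]`, settle into two classes after the second pass, when every pixel keeps its label.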
The unsupervised classification technique is used for the portions of the CAP LTER region for which only limited ancillary data (land use maps, aerial photographs, zoning maps) are available, essentially the region outside Maricopa County. Twenty or more classes are determined for each TM scene using the ISODATA algorithm, and each class is assigned a land cover category on the basis of its spectral signature, vegetation density, and geologic setting. These land cover classifications are considered preliminary pending detailed verification by ground truthing.
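One way an analyst might quantify the "vegetation density" of each cluster is to summarize a vegetation index per class. The sketch below assumes the Normalized Difference Vegetation Index (NDVI), computed from TM band 3 (red) and band 4 (near-infrared). The source does not say which measure was used, and the density thresholds in `suggest_vegetation_density` are hypothetical placeholders, not calibrated values.

```python
def ndvi(red, nir):
    """NDVI from TM band 3 (red) and band 4 (near-infrared) values."""
    total = red + nir
    return 0.0 if total == 0 else (nir - red) / total

def cluster_ndvi_summary(red_band, nir_band, cluster_labels):
    """Mean NDVI for each cluster label, as one aid for assigning land cover."""
    sums, counts = {}, {}
    for red, nir, label in zip(red_band, nir_band, cluster_labels):
        sums[label] = sums.get(label, 0.0) + ndvi(red, nir)
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}

def suggest_vegetation_density(mean_ndvi, thresholds=(0.2, 0.5)):
    """Coarse density hint; the threshold values are hypothetical."""
    low, high = thresholds
    if mean_ndvi < low:
        return "sparse"
    if mean_ndvi < high:
        return "moderate"
    return "dense"
```

The summary only supports the analyst's judgment. Spectral signature and geologic setting still have to be weighed separately.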
Supervised classification algorithms rely on user-defined training regions that represent pure samples of a particular class (such as asphalt). Several types of supervised classification algorithm exist, but the major ones are minimum distance, parallelepiped, and maximum likelihood. Minimum distance algorithms calculate the mean position in spectral space of each training region and then compare each image pixel to these means. The pixel is assigned to the class whose mean is closest. The parallelepiped algorithm constructs a box-shaped volume for each class in data space, bounded by the training data's range (or a multiple of its standard deviation) in each band. This further constrains which data points can be identified as a given class, and pixels falling outside every volume remain unclassified. Maximum likelihood algorithms assume that pixel values within each training class follow a Gaussian distribution, and they tend to be somewhat more accurate in regions of high surficial variability. Each image pixel is assigned to the class under which it is most probable, provided it falls within a chosen number of standard deviations of that class mean. The method has the added advantage that class probabilities can be weighted, so that image pixels are less likely to be classified as cover types with a low probability of occurrence in the scene. Maximum likelihood supervised classification is used for regions with good ground truth data, such as the Phoenix metropolitan region.
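The three supervised decision rules can be compared side by side in a minimal, dependency-free sketch. Several choices here are assumptions, not details from the source: minimum distance uses Euclidean distance, the parallelepiped boxes are mean +/- `k` standard deviations, overlapping boxes resolve to the first matching class, and the maximum likelihood rule uses the standard Gaussian discriminant with optional prior weights and an optional squared-Mahalanobis rejection threshold. The two-band `TRAINING` values are invented for illustration.

```python
import math

def band_means(samples):
    """Per-band mean of a list of training pixel vectors."""
    return [sum(band) / len(samples) for band in zip(*samples)]

def covariance(samples, mean):
    """Sample covariance matrix (n - 1 denominator) of the training pixels."""
    n, d = len(samples), len(mean)
    return [[sum((s[i] - mean[i]) * (s[j] - mean[j]) for s in samples) / (n - 1)
             for j in range(d)] for i in range(d)]

def invert_and_det(matrix):
    """Inverse and determinant of a small square matrix by Gauss-Jordan elimination."""
    n = len(matrix)
    a = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(matrix)]
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("singular covariance matrix; training region too small or uniform")
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the determinant's sign
        p = a[col][col]
        det *= p
        a[col] = [v / p for v in a[col]]
        for r in range(n):
            if r != col:
                factor = a[r][col]
                a[r] = [vr - factor * vc for vr, vc in zip(a[r], a[col])]
    return [row[n:] for row in a], det

def minimum_distance(pixel, training):
    """Assign the class whose training mean is nearest (Euclidean distance)."""
    means = {c: band_means(s) for c, s in training.items()}
    return min(means, key=lambda c: sum((x - m) ** 2 for x, m in zip(pixel, means[c])))

def parallelepiped(pixel, training, k=2.0):
    """Assign the first class whose box (mean +/- k std in every band) contains
    the pixel; return None (unclassified) if the pixel is outside every box."""
    for c, s in training.items():
        mean = band_means(s)
        cov = covariance(s, mean)
        if all(abs(x - m) <= k * math.sqrt(cov[i][i])
               for i, (x, m) in enumerate(zip(pixel, mean))):
            return c
    return None

def maximum_likelihood(pixel, training, priors=None, max_sq_distance=None):
    """Assign the class maximizing ln P(c) - 0.5 ln|S| - 0.5 (x - m)^T S^-1 (x - m)."""
    best, best_score, best_dist = None, -math.inf, math.inf
    for c, s in training.items():
        mean = band_means(s)
        inv, det = invert_and_det(covariance(s, mean))
        diff = [x - m for x, m in zip(pixel, mean)]
        mahalanobis = sum(diff[i] * inv[i][j] * diff[j]
                          for i in range(len(diff)) for j in range(len(diff)))
        prior = 1.0 if priors is None else priors[c]
        score = math.log(prior) - 0.5 * math.log(det) - 0.5 * mahalanobis
        if score > best_score:
            best, best_score, best_dist = c, score, mahalanobis
    if max_sq_distance is not None and best_dist > max_sq_distance:
        return None  # too far from the best class mean: leave unclassified
    return best

# Hypothetical two-band training regions (digital numbers), for illustration only.
TRAINING = {
    "asphalt": [(40, 38), (42, 41), (41, 37), (43, 40)],
    "grass": [(30, 90), (32, 95), (29, 92), (31, 88)],
}
```

With these samples, a pixel such as `(41, 39)` is labeled asphalt by all three rules. A pixel such as `(100, 0)` is still forced into the nearest class by minimum distance but is left unclassified by the parallelepiped rule. Passing `priors` is where the scene-frequency weighting mentioned above would enter.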
Verification of Classification Accuracy
The accuracy of any classification must be assessed before it is used in scientific analysis. Accuracy assessment involves collecting ground truth data for the classified scene. This is done by establishing a number of test pixels within the image whose actual ground cover is determined by field inspection, aerial photographs, or some other dataset. Overall classification accuracy is then calculated by dividing the number of correctly classified test pixels by the total number of test pixels.
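The overall accuracy calculation is a simple ratio, sketched below. The confusion-matrix tally is included as an assumed convenience, since it shows which classes the errors fall between. The function names and class labels are illustrative.

```python
from collections import Counter

def overall_accuracy(reference, predicted):
    """Correctly classified test pixels divided by the total number of test pixels."""
    if not reference or len(reference) != len(predicted):
        raise ValueError("need equal-length, non-empty reference and predicted lists")
    correct = sum(1 for ref, pred in zip(reference, predicted) if ref == pred)
    return correct / len(reference)

def confusion_counts(reference, predicted):
    """Count of test pixels for each (reference class, predicted class) pair."""
    return Counter(zip(reference, predicted))
```

For four test pixels with reference classes `["asphalt", "grass", "grass", "soil"]` and predicted classes `["asphalt", "grass", "soil", "soil"]`, three of four agree, giving an overall accuracy of 0.75.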
Assessing accuracy for the CAP LTER study area is not straightforward, because the main dataset available for comparison is the Maricopa Association of Governments (MAG) Land Use Map. Land use can be thought of as the purpose to which a particular area is devoted, such as industrial or recreational use. Comparing the MAG dataset with classified TM data therefore requires interpreting the land cover types in terms of possible land uses. This is complicated because several land covers may be associated with the same land use category. For example, the institutional MAG land use category corresponds to the asphalt+concrete+soil+/-metal roofs, asphalt+concrete+soil+/-metal roofs+/-grass, and concrete+grass+/-woody veg.+/-asphalt land cover classes. Because land use is the primary data format used by the various LTER researchers, a fairly simple categorization scheme was defined for use in LTER studies. The overall accuracy of the TM classification is 71%, a value typical of TM data.
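A minimal sketch of scoring land cover classes against MAG land use uses a lookup from each cover class to the land uses it is compatible with. A test pixel counts as correct when its MAG land use is among them. Only the three "institutional" entries below come from the text. The lookup structure, the treatment of unmapped covers as incorrect, and the function name are assumptions. A complete scheme would list every cover class, some of them compatible with several uses.

```python
# Land cover -> compatible land uses. Only the "institutional" entries are from
# the text; a full scheme would cover every class (hypothetical structure).
COVER_TO_USES = {
    "asphalt+concrete+soil+/-metal roofs": {"institutional"},
    "asphalt+concrete+soil+/-metal roofs+/-grass": {"institutional"},
    "concrete+grass+/-woody veg.+/-asphalt": {"institutional"},
}

def land_use_agreement(cover_classes, mag_land_uses, cover_to_uses=COVER_TO_USES):
    """Fraction of test pixels whose classified cover is compatible with the
    MAG land use; covers missing from the lookup count as incorrect."""
    if not cover_classes or len(cover_classes) != len(mag_land_uses):
        raise ValueError("need equal-length, non-empty inputs")
    hits = sum(1 for cover, use in zip(cover_classes, mag_land_uses)
               if use in cover_to_uses.get(cover, set()))
    return hits / len(cover_classes)
```

Because the comparison runs through this interpretive mapping, the resulting figure depends on how the categorization scheme is defined as well as on the classifier itself.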